Extending Q-Learning to General Adaptive Multi-Agent Systems

نویسنده

  • Gerald Tesauro
چکیده

Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources

This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage is employed a PID controller which its parameters are designed using sine cosine optimization (SCO...

متن کامل

Optimal adaptive leader-follower consensus of linear multi-agent systems: Known and unknown dynamics

In this paper, the optimal adaptive leader-follower consensus of linear continuous time multi-agent systems is considered. The error dynamics of each player depends on its neighbors’ information. Detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...

متن کامل

Voltage Coordination of FACTS Devices in Power Systems Using RL-Based Multi-Agent Systems

This paper describes how multi-agent system technology can be used as the underpinning platform for voltage control in power systems. In this study, some FACTS (flexible AC transmission systems) devices are properly designed to coordinate their decisions and actions in order to provide a coordinated secondary voltage control mechanism based on multi-agent theory. Each device here is modeled as ...

متن کامل

Adaptive Neural Network Method for Consensus Tracking of High-Order Mimo Nonlinear Multi-Agent Systems

This paper is concerned with the consensus tracking problem of high order MIMO nonlinear multi-agent systems. The agents must follow a leader node in presence of unknown dynamics and uncertain external disturbances. The communication network topology of agents is assumed to be a fixed undirected graph. A distributed adaptive control method is proposed to solve the consensus problem utilizing re...

متن کامل

User-based Vehicle Route Guidance in Urban Networks Based on Intelligent Multi Agents Systems and the ANT-Q Algorithm

Guiding vehicles to their destination under dynamic traffic conditions is an important topic in the field of Intelligent Transportation Systems (ITS). Nowadays, many complex systems can be controlled by using multi agent systems. Adaptation with the current condition is an important feature of the agents. In this research, formulation of dynamic guidance for vehicles has been investigated based...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003